
hexagon: make vmem and buffer-size configurable#22487

Merged
max-krasnyansky merged 9 commits into ggml-org:master from qualcomm:hexagon-auto-vmem
Apr 29, 2026

Conversation

@max-krasnyansky
Member

Overview

This PR adds two new knobs to the Hexagon backend:

  • GGML_HEXAGON_VMEM
    Allows overriding the default VMEM limit. The default is the same as before (around 3.2GB).
    If set to 0, the backend will try to measure the limit by pre-mmapping the buffers (see the sketch below).
  • GGML_HEXAGON_MBUF
    Allows overriding the default buffer size. The default is the same as before (1GB).
    This might be handy on IoT devices where the allocator might struggle with 1GB DMA buffers.
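
For illustration, here is a minimal C++ sketch of how env-var knobs like these are typically read at backend init. The helper name `hex_env_u64` is hypothetical, and the assumption that the values are raw byte counts is mine; the actual parsing in the PR may differ.

```cpp
// Hypothetical sketch, not the PR's actual code: read an unsigned 64-bit
// override from the environment, falling back to a built-in default.
#include <cstdint>
#include <cstdlib>

static uint64_t hex_env_u64(const char * name, uint64_t def) {
    const char * s = std::getenv(name);
    return s ? std::strtoull(s, nullptr, 0) : def;  // base 0: accepts hex too
}

// Defaults taken from the PR description; byte units are an assumption.
static const uint64_t vmem_limit = hex_env_u64("GGML_HEXAGON_VMEM",
                                               (uint64_t) (3.2 * (1ull << 30)));  // ~3.2GB
static const uint64_t mbuf_size  = hex_env_u64("GGML_HEXAGON_MBUF",
                                               1ull << 30);                       // 1GB
// Per the PR, GGML_HEXAGON_VMEM=0 means "measure the limit by pre-mmapping".
```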

I also streamlined mapping management a bit further (pinned mappings are now managed directly by the host, etc.) and updated logging in the related areas.
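
To make the GGML_HEXAGON_VMEM=0 auto-measurement idea concrete, below is a loose, self-contained POSIX sketch that estimates mappable space by grabbing anonymous mappings until mmap fails, then releasing them. The chunk size, the cap, and the use of plain host mmap (rather than the backend's actual pre-mmapping of its buffers) are all assumptions for illustration.

```cpp
// Simplified host-side stand-in for the "measure it by pre-mmapping" idea;
// the real backend measures by pre-mmapping its own buffers, not anonymous maps.
#include <sys/mman.h>
#include <cstdint>
#include <cstdio>
#include <vector>

static uint64_t probe_vmem(uint64_t chunk, uint64_t max_total) {
    std::vector<void *> maps;
    uint64_t total = 0;
    while (total + chunk <= max_total) {
        void * p = mmap(nullptr, chunk, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) break;   // ran out of mappable space
        maps.push_back(p);
        total += chunk;
    }
    for (void * p : maps) munmap(p, chunk);  // release the probe mappings
    return total;
}

int main() {
    // probe in 256MB chunks, capped at 8GB (both values arbitrary)
    uint64_t avail = probe_vmem(256ull << 20, 8ull << 30);
    std::printf("mappable: %llu MB\n", (unsigned long long) (avail >> 20));
}
```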


@max-krasnyansky max-krasnyansky requested a review from a team as a code owner April 28, 2026 21:21
@github-actions github-actions bot added the script (Script related), ggml (changes relating to the ggml tensor library for machine learning), and Hexagon labels Apr 28, 2026
@max-krasnyansky
Member Author

@ggml-org/maintainers can I get a second approval, please?

Comment thread on scripts/snapdragon/adb/run-cli.sh
@max-krasnyansky
Member Author

@lhez can you please re-approve (stale again after merging suggestions)

@max-krasnyansky max-krasnyansky merged commit 41a63be into ggml-org:master Apr 29, 2026
50 checks passed
tekintian added a commit to tekintian/llama.cpp that referenced this pull request May 1, 2026
* 'master' of github.com:tekintian/llama.cpp: (659 commits)
  ggml-webgpu: Improve performance of mat-vec and mat-mat for MUL_MAT_ID (ggml-org#22464)
  Update llama-mmap to use ftello/fseeko (ggml-org#22497)
  common : check for null getpwuid in hf-cache (ggml-org#22550)
  vulkan: add get/set tensor 2d functions (ggml-org#22514)
  spec: fix argument typo (ggml-org#22552)
  ci : bump ty to 0.0.33 (ggml-org#22535)
  vendor : update cpp-httplib to 0.43.2 (ggml-org#22548)
  CUDA: fix tile FA kernel on Pascal (ggml-org#22541)
  scripts : add wc2wt.sh - create worktree from current HEAD (ggml-org#22513)
  add fast matmul iquants (ggml-org#22504)
  spec : fix draft model checkpoints (ggml-org#22521)
  spec : fix vocab compat checks in spec example (ggml-org#22426)
  common : do not pass prompt tokens to reasoning budget sampler (ggml-org#22488)
  hexagon: make vmem and buffer-size configurable (ggml-org#22487)
  CUDA: fuse SSM_CONV + ADD(bias) + SILU (ggml-org#22478)
  spec : disacard last drafted token with low prob (ggml-org#22506)
  sync : ggml
  ggml : bump version to 0.10.1 (ggml/1469)
  webui: fix slow mic stop and WAV encode (ggml-org#22480)
  ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault (ggml-org#22293)
  ...

# Conflicts:
#	.gitignore
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
* hexagon: allow host to set max vmem size

We use a sane default but it's helpful to allow for an override if needed.

* hexagon: add support for measuring vmem space and move pinned mmaping management to host

* hexagon: update vmem checks to use uint64

* hexagon: bump op buffers to 16 (matches max mmaps)

* hexagon: bump default vmem to 3.2GB

* hexagon: add support for autodetecting vmem space and some logging cleanup in that area

* hexagon: fix whitespace warnings

* Update scripts/snapdragon/adb/run-cli.sh

Co-authored-by: Pascal <admin@serveurperso.com>

* hex-adb: fix run-completion script

---------

Co-authored-by: Pascal <admin@serveurperso.com>
